Mining the Web for IP Address Geolocations

Authors

  • Chen Chen
  • Chuanxiong Guo
  • Yunxin Liu
  • Helen J. Wang
  • Qing Yu
  • Yongguang Zhang

Abstract

In this paper, we observe that many Web pages contain geolocation information (address, zipcode, and telephone area code), and many of these geolocation items are directly related to the locations of the IP addresses that host the Web pages. We then design Structon, a system that mines Web pages for IP address geolocations. In Structon, we first extract geolocation information from every crawled Web page, and then apply a series of information-clustering, false-information-filtering, error-correction, and location-inference algorithms to map IP addresses to geolocations. We have run our algorithms on a set of 74M Chinese Web pages, from which we are able to identify the geolocations of 8.2M IP addresses, covering not only Web servers but also client hosts. We have verified our results against the IP address location table of a major Chinese ISP; the verification shows that the accuracy of Structon is 94.4% at the province level.
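The abstract names the pipeline stages but not their algorithms, so the following is only a toy sketch of the overall idea under stated assumptions: extract zipcodes and telephone area codes from the text of each page, turn each into a province "vote" for the IP that hosts the page, keep a province only when it wins a strict majority of that IP's votes, and score the result against a reference table at the province level. The lookup tables, regular expressions, the `min_share` threshold, and the function names (`extract_provinces`, `map_ips`, `province_accuracy`) are all hypothetical. Majority voting here is a simple stand-in, not Structon's clustering, filtering, or error-correction algorithms.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy lookup tables (illustration only). A real system would
# need complete zipcode-prefix and area-code tables for every province.
ZIP_PREFIX_TO_PROVINCE = {"10": "Beijing", "20": "Shanghai", "51": "Guangdong"}
AREA_CODE_TO_PROVINCE = {"010": "Beijing", "021": "Shanghai", "020": "Guangdong"}

# A six-digit run not embedded in a longer number is treated as a zipcode.
ZIP_RE = re.compile(r"(?<!\d)(\d{6})(?!\d)")
# Numbers written as "0AA-NNNNNNNN" or "0AAA-NNNNNNN"; group 1 is the area code.
PHONE_RE = re.compile(r"(?<!\d)(0\d{2,3})-\d{7,8}(?!\d)")


def extract_provinces(text):
    """Return one province vote per recognised zipcode or area code."""
    votes = []
    for zipcode in ZIP_RE.findall(text):
        province = ZIP_PREFIX_TO_PROVINCE.get(zipcode[:2])
        if province:
            votes.append(province)
    for area_code in PHONE_RE.findall(text):
        province = AREA_CODE_TO_PROVINCE.get(area_code)
        if province:
            votes.append(province)
    return votes


def map_ips(pages, min_share=0.5):
    """Assign each IP the province with the most votes across its pages.

    pages: iterable of (ip, page_text) pairs. An IP is kept only if its
    leading province holds more than `min_share` of its votes; this crude
    rule stands in for false-information filtering.
    """
    votes_by_ip = defaultdict(Counter)
    for ip, text in pages:
        votes_by_ip[ip].update(extract_provinces(text))
    result = {}
    for ip, votes in votes_by_ip.items():
        if not votes:
            continue  # no usable geolocation items for this IP
        province, count = votes.most_common(1)[0]
        if count / sum(votes.values()) > min_share:
            result[ip] = province
    return result


def province_accuracy(predicted, reference):
    """Fraction of IPs present in both tables whose provinces agree."""
    common = predicted.keys() & reference.keys()
    if not common:
        return 0.0
    return sum(predicted[ip] == reference[ip] for ip in common) / len(common)


if __name__ == "__main__":
    pages = [
        ("1.2.3.4", "Haidian, Beijing 100080, tel 010-62345678"),
        ("1.2.3.4", "Contact: 010-87654321"),
        ("5.6.7.8", "Shanghai 200030, tel 021-12345678"),
        ("9.9.9.9", "zip 100080, tel 021-12345678"),  # conflicting votes
    ]
    predicted = map_ips(pages)
    print(predicted)  # {'1.2.3.4': 'Beijing', '5.6.7.8': 'Shanghai'}
    reference = {"1.2.3.4": "Beijing", "5.6.7.8": "Guangdong"}
    print(province_accuracy(predicted, reference))  # 0.5
```

In the usage example, the IP `9.9.9.9` receives one Beijing vote and one Shanghai vote, so no province holds a strict majority and the IP is dropped rather than guessed. Requiring a strict majority trades coverage for precision, which is one plausible reading of why a filtering stage precedes location inference.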

Similar resources

Web Usage mining framework for Data Cleaning and IP address Identification

The World Wide Web is the most widely known information source that is easily available and searchable. It consists of billions of interconnected documents (Web pages) authored by millions of people. Users' accesses to pages are recorded in web logs. These log files exist in various formats. Because of the growing use of the Web, the size of web log files is increasing at a much fa...

Expert Discovery: A web mining approach

Expert discovery seeks to answer the question: “Who is the best expert of a specific subject in a particular domain within peculiar array of parameters?” Experts with domain knowledge in any field are crucial for consulting in industry, academia, and the scientific community. The aim of this study is to address the issues of the expert-finding task in real-world communities. Collabor...

Analyzing new features of infected web content in detection of malicious web pages

Recent improvements in web standards and technologies enable attackers to hide and obfuscate infectious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...

Analysis of Web Logs and Web User in Web Mining

Log files contain information about User Name, IP Address, Time Stamp, Access Request, number of Bytes Transferred, Result Status, URL that Referred and User Agent. The log files are maintained by the web servers. Analysing these log files gives a clear picture of the user. This paper discusses these log files in detail: their formats, their creation, access procedures, their us...

High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences

Nowadays, high fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high-utility threshold. It involves extracting patterns that are frequently accessed in mobile web service sequences. Unlike the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...

Publication date: 2007